Abstract

The Gaussian theory of errors has been generalized to situations
where the Gaussian distribution and, hence, the Gaussian rules of error
propagation are inadequate. The generalizations are based on Bayes'
theorem and a suitable measure. The following text sketches some
chapters of a monograph [] that is presently in preparation. We concentrate
on the material that is, to the best of our knowledge, not yet in the
statistical literature. See especially the extension of form invariance to
discrete data in section 4, the criterion for the compatibility between a
proposed distribution and sparse data in section 7, and the "discovery"
of probability amplitudes in section 9.



1 The Prior Distribution 

Bayes' theorem [] allows one to deduce the distribution $P(\xi|x)$ of the
parameter $\xi$ conditioned by the data $x$. The distribution $p(x|\xi)$ of the data
conditioned by the parameter $\xi$ must be given. The theorem reads

$$ P(\xi|x)\, m(x) = p(x|\xi)\, \mu(\xi) , \tag{1} $$

$$ m(x) = \int d\xi\, p(x|\xi)\, \mu(\xi) . \tag{2} $$

See e.g. []. Here, $\mu(\xi)$ is called the prior and $P$ the posterior distribution
of $\xi$. The posterior can be used to deduce an error interval $I$: we define
it as the smallest interval in which $\xi$ lies with probability $K$. This is called
the Bayesian interval $I = I(K)$. In order to make it independent of any
reparametrisation $\eta = T(\xi)$, one has to judge the size $A$ of an interval $I$ with
the help of a measure $\mu(\xi)$, i.e.

$$ A = \int_I d\xi\, \mu(\xi) . \tag{3} $$

We identify this measure with the prior distribution of $\xi$.
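The construction of the Bayesian interval can be sketched numerically. The following is a minimal illustration, not the author's procedure: it assumes a single Poisson count with the parametrization $\xi = \lambda^{1/2}$ and a uniform measure (the case treated in section 5), discretizes $\xi$ on a grid, and collects grid cells in order of decreasing $P/\mu$ until they hold the probability $K$. The function name `bayesian_interval`, the grid and the value `x = 4` are hypothetical choices.

```python
import numpy as np

def bayesian_interval(xi, posterior, mu, K):
    """Smallest-measure interval holding probability K (assumes P/mu is unimodal)."""
    dxi = np.gradient(xi)                    # grid cell widths
    ratio = posterior / mu                   # density relative to the measure
    order = np.argsort(ratio)[::-1]          # cells sorted by decreasing P/mu
    mass = np.cumsum(posterior[order] * dxi[order])
    n_keep = min(np.searchsorted(mass, K) + 1, len(xi))
    kept = order[:n_keep]
    return xi[kept].min(), xi[kept].max()

x = 4                                        # observed count (illustrative)
xi = np.linspace(1e-6, 8.0, 20001)
mu = np.ones_like(xi)                        # uniform measure, assumed here
likelihood = xi**(2 * x) * np.exp(-xi**2)    # p(x|xi) up to the factor 1/x!
m = np.sum(likelihood * mu * np.gradient(xi))    # eq. (2) on the grid
posterior = likelihood * mu / m                  # eq. (1)
lo, hi = bayesian_interval(xi, posterior, mu, K=0.68)
print(lo, hi)
```

Ranking the cells by $P/\mu$ rather than by $P$ is what makes the resulting interval independent of the parametrization.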



2 Form Invariance 

Ideally the conditional distribution possesses a symmetry called form
invariance. The family of distributions then emerges by a mathematical
group of transformations $G_\xi x$ from one and the same basic distribution $w$,
i.e.

$$ p(x|\xi)\, dx = w(G_\xi x)\, dG_\xi x . \tag{4} $$

It is not required that every acceptable $p$ has this symmetry. But the symmetry
guarantees an unbiased inference in the sense of section 3. If there is
no form invariance, unbiased inference can be achieved only approximately.

The prior distribution is defined as the invariant measure of the group
of transformations. Symmetry arguments were first discussed in [], [].
They were not generally accepted because not all reasonable distributions
possess the symmetry []. It cannot exist at all if $x$ is discrete. Since $\xi$ is
assumed to be continuous, it can be changed infinitesimally. However, no
infinitesimal transformation of a discrete variable is possible. In section 4,
we generalize form invariance to this case.

Form invariance is a property of ideal, well behaved distributions. How- 
ever, its existence is not a prerequisite of statistical inference, see section 
6. 

The invariant measure can be found from $p$, without analysis of the
group, by evaluating the expression

$$ \mu(\xi) \propto \left[ \det \int dx\, p(x|\xi)\, \partial_\xi L\, \partial_\xi^T L \right]^{1/2} . \tag{5} $$

Here, the function $L$ is

$$ L(\xi) = \ln p(x|\xi) \tag{6} $$

and $\partial_\xi L\, \partial_\xi^T L$ means the dyadic product of the vector $\partial_\xi L$ of partial
derivatives with itself. Eq. (5) is known as Jeffreys' rule [].

One shall see in section 6 that this expression defines $\mu$ in any case, that
is to say in the absence of form invariance, too.
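Jeffreys' rule (5) can be checked numerically in the one-parameter case, where the determinant reduces to a single sum. The sketch below assumes a Poisson distribution and evaluates the rule once in the parameter $\lambda$ and once in $\xi = \lambda^{1/2}$; the helper names `poisson_logp` and `jeffreys_measure`, the truncation of $x$ at 200 and the finite-difference step are illustrative assumptions.

```python
import numpy as np
from math import lgamma

def poisson_logp(x, lam):
    """ln p(x|lambda) for the Poisson distribution."""
    return x * np.log(lam) - lam - np.array([lgamma(v + 1) for v in x])

def poisson_logp_xi(x, xi):
    """Same distribution, parametrized by xi = sqrt(lambda)."""
    return poisson_logp(x, xi**2)

def jeffreys_measure(logp, param, x, h=1e-4):
    """One-parameter version of eq. (5): sqrt( sum_x p * (dL/dparam)^2 )."""
    L0 = logp(x, param)
    dL = (logp(x, param + h) - logp(x, param - h)) / (2 * h)
    return np.sqrt(np.sum(np.exp(L0) * dL**2))

x = np.arange(0, 200)                       # truncated range of counts
mu_lam = [jeffreys_measure(poisson_logp, lam, x) for lam in (1.0, 4.0, 9.0)]
mu_xi = [jeffreys_measure(poisson_logp_xi, xi, x) for xi in (1.0, 2.0, 3.0)]
print(mu_lam)   # falls off like lambda**(-1/2)
print(mu_xi)    # constant in xi
```

In the parameter $\lambda$ the measure varies, whereas in $\xi$ it is constant, which is the uniform measure found for the Poisson distribution in section 5.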
3 Invariance of the Entropy of the Posterior Distribution



The posterior distribution has the same symmetry as the conditional
distribution $p(x|\xi)$ if form invariance exists. The entropy

$$ H(x) = - \int d\xi\, P(\xi|x) \ln \frac{P(\xi|x)}{\mu(\xi)} \tag{7} $$

is then independent of the true value of the parameter $\xi$ because one has

$$ H(x) = H(G_\rho x) \tag{8} $$

for every transformation $G_\rho$ of the symmetry group. This entails that $H(x)$
does not depend on the true value of $\xi$ but only on the number of the data
$x_1 \ldots x_N$. One can say that all values of the parameter $\xi$ are equally
difficult to measure. In this sense, form invariance guarantees unbiased
estimation of $\xi$ and, by the same token, the invariant measure $\mu$ is the
parametrization of ignorance about $\xi$.
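The invariance (8) can be illustrated with a model that is form invariant under a change of scale. The sketch below assumes a single observation $x$ from the exponential distribution $\xi^{-1}\exp(-x/\xi)$ with the measure $\mu(\xi) = 1/\xi$ (the invariant measure of the scale group, which also follows from Jeffreys' rule). It evaluates the entropy (7) on a grid in $t = \ln\xi$; the function name and grid parameters are hypothetical.

```python
import numpy as np

def posterior_entropy(x, n_grid=200001, half_width=40.0):
    """Entropy (7) for one exponential observation x, with mu(xi) = 1/xi."""
    t = np.linspace(np.log(x) - half_width, np.log(x) + half_width, n_grid)
    xi = np.exp(t)                           # integrate over t = ln(xi)
    u = x / xi
    post_times_xi = u * np.exp(-u)           # P(xi|x) * xi, since d(xi) = xi dt
    log_ratio = np.log(u) - u                # ln[ P(xi|x) / mu(xi) ]
    dt = t[1] - t[0]
    return -np.sum(post_times_xi * log_ratio) * dt

H_small = posterior_entropy(0.3)
H_large = posterior_entropy(50.0)
print(H_small, H_large)
```

Both calls give the same value, namely $1 + \gamma$ with Euler's constant $\gamma$, whatever the observed $x$: every value of the scale parameter is equally difficult to measure.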



4 Form Invariance for Discrete x 

If the variable $x$ is discrete, e.g. a number of counts, then form invariance
cannot exist in the sense of eq. (4) since an infinitesimal shift of $\xi$ cannot be
compensated by an infinitesimal transformation of $x$. One then has to define
a vector $a(\xi)$ the components of which are labelled by $x$. The probability
$p(x|\xi)$ must be a unique function of $a_x(\xi)$. Form invariance then means that

$$ a(\xi) = G_\xi\, a(\xi = 0) . \tag{9} $$

Again $\mu$ is the invariant measure of the group. The transformation $G_\xi$ shall
be linear so that it is the linear representation of the symmetry group of
form invariance. It is necessarily unitary.

The choice $a_x(\xi) = p(x|\xi)$ is precluded because a group of transformations
cannot, for all of its elements, map a vector with positive elements
onto one with the same property. With the choice

$$ a_x(\xi) = \sqrt{p(x|\xi)} \tag{10} $$

one succeeds. That means: important discrete distributions, such as the
Poisson and the binomial distributions, possess form invariance. Furthermore,
the property (9) can be recast into a relation corresponding to eq. (4),
i.e. it can be written as a linear transformation of the space of functions
$(p(x|\xi))^{1/2}$. Hence, (9) is not different from (4); it is a generalization.

Note that (10) is a probability amplitude as it is used in quantum mechanics.
However, it is real up to this point. The generalization to complex
probability amplitudes is sketched in section 9.



5 The Poisson Distribution 

Form invariance in the sense of section 4 does not seem to have been treated 
in the literature on statistics. As an example let us consider the Poisson 
distribution 

$$ p(x|\xi) = \frac{\lambda^x}{x!} \exp(-\lambda) , \qquad x = 0, 1, 2, \ldots \tag{11} $$

With

$$ \xi = \lambda^{1/2} \tag{12} $$

one obtains the amplitudes

$$ a_x(\xi) = \frac{\xi^x}{\sqrt{x!}} \exp(-\xi^2/2) . \tag{13} $$

The derivative of $a$ is found to be

$$ \frac{\partial}{\partial \xi}\, a(\xi) = (A^+ - A)\, a(\xi) , \tag{14} $$

where $A$, $A^+$ are linear operators independent of $\xi$. They have the commutator

$$ [A, A^+] = 1 . \tag{15} $$

Hence, $A$, $A^+$ are destruction and creation operators of numbers of counts
or events. Integrating the differential equation (14) one finds

$$ a(\xi) = \exp\left( \xi (A^+ - A) \right) |0\rangle . \tag{16} $$

Here, the vacuum $|0\rangle$ is the vector that provides zero counts with probability
1. Equation (16) means that the linear transformation is

$$ G_\xi = \exp\left( \xi (A^+ - A) \right) . \tag{17} $$

The measure $\mu$ of this group of transformations is

$$ \mu(\xi) = \text{const.} \tag{18} $$

It can also be obtained by straightforward application of Jeffreys' rule (5)
without analysis of the symmetry group.
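The differential equation (14) and the commutator (15) can be verified with finite matrices. The sketch below is an illustration under stated assumptions: the count space is truncated at `dim = 60` (so the relations hold only away from the last component), the derivative is taken by central differences, and the names `amplitudes`, `A` and `A_plus` are chosen here for convenience.

```python
import numpy as np
from math import lgamma

def amplitudes(xi, dim):
    """Poisson amplitudes a_x(xi) = xi^x / sqrt(x!) * exp(-xi^2/2), x = 0..dim-1."""
    x = np.arange(dim)
    log_a = x * np.log(xi) - 0.5 * np.array([lgamma(v + 1) for v in x]) - 0.5 * xi**2
    return np.exp(log_a)

dim = 60
A = np.diag(np.sqrt(np.arange(1, dim)), k=1)   # destruction operator (truncated)
A_plus = A.T                                    # creation operator

xi, h = 1.5, 1e-5
a = amplitudes(xi, dim)
da = (amplitudes(xi + h, dim) - amplitudes(xi - h, dim)) / (2 * h)

residual = np.max(np.abs(da - (A_plus - A) @ a))   # eq. (14)
norm = np.sum(a**2)                                # total probability
commutator = A @ A_plus - A_plus @ A               # eq. (15), up to truncation
print(residual, norm)
```

The residual vanishes to numerical accuracy and the squared amplitudes sum to one; only the last diagonal element of the commutator deviates from 1, which is an artifact of the truncation.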

This can be generalized to the joint Poisson distribution 

$$ p(x_1 \ldots x_M | \xi_1 \ldots \xi_M) = \prod_{k=1}^{M} \frac{\xi_k^{2x_k}}{x_k!} \exp(-\xi_k^2) \tag{19} $$

of the numbers $x_k$ of counts in a histogram with $M$ bins. One finds the
amplitude vector

$$ a(\xi_1 \ldots \xi_M) = \exp\left( \sum_k \xi_k (A_k^+ - A_k) \right) |0\rangle \tag{20} $$

and again the uniform measure $\mu(\xi) = \text{const}$.

As a further generalization, one can introduce destruction and creation
operators $B_\nu$, $B_\nu^+$ of quasi-events $\nu = 1 \ldots n$ via

$$ B_\nu = \sum_{k=1}^{M} c_{k\nu} A_k . \tag{21} $$

If the vectors $|c_\nu\rangle$ for $\nu = 1 \ldots n$ are orthonormal then

$$ [B_\nu, B_{\nu'}^+] = \delta_{\nu\nu'} , \tag{22} $$

whence $B_\nu$, $B_\nu^+$ are destruction and creation operators. One finds the
amplitude vector

$$ a(\xi) = \exp\left( \sum_\nu \xi_\nu (B_\nu^+ - B_\nu) \right) |0\rangle . \tag{23} $$

The amplitude to find the event $x$ is given by

$$ a_x(\xi) = \prod_{k=1}^{M} \frac{\Xi_k^{x_k}}{\sqrt{x_k!}}\, \exp\left( -\frac{1}{2} \sum_\nu \xi_\nu^2 \right) . \tag{24} $$

Here, the amplitude

$$ \Xi_k = \sum_{\nu=1}^{n} \xi_\nu c_{k\nu} \tag{25} $$

to find events in the $k$-th bin is given by an expansion into the orthogonal
system of amplitude vectors $|c_\nu\rangle$. More precisely: by working with the
creation operators $B_\nu^+$, one infers an expansion of the vector $|\Xi\rangle$ in terms of
the orthogonal system $|c_\nu\rangle$. The prior distribution of the amplitudes is
again uniform,

$$ \mu(\xi_1 \ldots \xi_n) = \text{const.} \tag{26} $$

In summary: the problem of finding the expansion coefficients from the
counting rates is form invariant and thus guarantees unbiased inference.
One should therefore expand probability amplitudes and not probabilities
in terms of an orthogonal system if one performs e.g. a Fourier analysis.
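The expansion (25) of the bin amplitudes can be made concrete with a small numerical example. The sketch below is purely illustrative: the orthonormal vectors $|c_\nu\rangle$ are generated by a QR decomposition of a random matrix, and the values of `M`, `n` and `xi` are arbitrary assumptions. It shows that the total expected number of counts equals $\sum_\nu \xi_\nu^2$, as implied by the exponential factor in (24), and that the coefficients are recovered by projection.

```python
import numpy as np

rng = np.random.default_rng(1)
M, n = 12, 3                                   # bins and quasi-events (illustrative)
c, _ = np.linalg.qr(rng.normal(size=(M, n)))   # columns c[:, nu] are orthonormal

xi = np.array([3.0, 1.5, -0.5])    # real expansion coefficients xi_nu
Xi = c @ xi                        # bin amplitudes, eq. (25)
lam = Xi**2                        # expected counts per bin
counts = rng.poisson(lam)          # one simulated histogram

xi_back = c.T @ Xi                 # projection recovers the coefficients
print(lam.sum(), np.sum(xi**2))
```

Because the expansion is linear in the amplitudes and not in the probabilities, individual bin amplitudes $\Xi_k$ may be negative while the expected counts $\Xi_k^2$ stay non-negative.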

6 The Prior Probability in the Absence of Form 
Invariance 

Jeffreys' rule (5) can be rewritten in the form

$$ \mu(\xi) \propto \left[ \det \int dx\, \partial_\xi a_x\, \partial_\xi^T a_x \right]^{1/2} . \tag{27} $$

The integral means a summation if $x$ is discrete.

In differential geometry [9], it is shown that (27) is the measure on the
surface defined by the parametrisation $a(\xi)$. A prerequisite for this measure
is the assumption that one has the same uniform measure on each coordinate
axis in the space; more precisely, the metric tensor of the space must be
proportional to the unit matrix. Since the coordinates $a_x$ are probability
amplitudes, this is justified by the last result of section 5.

Hence, Jeffreys' rule provides the prior distribution in any case. In the
absence of form invariance, however, one cannot guarantee that all values of
the parameter $\xi$ are equally difficult to measure, i.e. one cannot guarantee
unbiased inference.
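The equivalence of the two forms (5) and (27) of Jeffreys' rule can be checked numerically. The sketch below assumes a binomial distribution with $n$ trials whose success probability is written as $\sin^2\theta$; this angle parametrization is an assumption made here for illustration and is not taken from the text. Both measures come out constant in $\theta$ and differ only by the factor 2 that the proportionality sign absorbs.

```python
import numpy as np
from math import comb

def binom_p(theta, n):
    """Binomial probabilities p(x|theta), x = 0..n, with success probability sin^2(theta)."""
    x = np.arange(n + 1)
    q = np.sin(theta)**2
    coeff = np.array([comb(n, int(v)) for v in x], dtype=float)
    return coeff * q**x * (1.0 - q)**(n - x)

def measures(theta, n, h=1e-5):
    p0 = binom_p(theta, n)
    dp = (binom_p(theta + h, n) - binom_p(theta - h, n)) / (2 * h)
    dL = dp / p0                         # derivative of L = ln p
    da = dp / (2.0 * np.sqrt(p0))        # derivative of the amplitude a = sqrt(p)
    mu_5 = np.sqrt(np.sum(p0 * dL**2))   # eq. (5), one parameter
    mu_27 = np.sqrt(np.sum(da**2))       # eq. (27), one parameter
    return mu_5, mu_27

n = 10
results = [measures(theta, n) for theta in (0.4, 0.8, 1.2)]
print(results)
```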

7 Does a Proposed Distribution Fit an Observed 
Histogram? 

The Poisson distribution (19) yields the posterior

$$ P(\xi_1 \ldots \xi_M | x_1 \ldots x_M) \propto \prod_{k=1}^{M} \xi_k^{2x_k} \exp(-\xi_k^2) . \tag{28} $$

We want to decide whether, in the light of the data, the proposal $\tau$ is
a reasonable estimate of $\xi_k$, $k = 1 \ldots M$. This is equivalent to the question
whether $\tau$ is in the Bayesian interval $I = I(K)$. The Bayesian interval is
bordered by the "contour line" $\Gamma(K)$ which is, in the case at hand,
defined as the set of points with the property $P(\xi|x) = C(K)$. This means
that $\tau \in I$ exactly if

$$ P(\tau|x) > C(K) \tag{29} $$

or that $\tau$ is accepted if and only if (29) holds. The number $C(K)$ can be
calculated.
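The decision (29) can be sketched by Monte Carlo without computing $C(K)$ explicitly: $\tau$ lies in $I(K)$ exactly if the posterior probability of the region where $P(\xi|x) > P(\tau|x)$ is smaller than $K$. The code below is a hedged illustration, not the analytic criterion of the text. It assumes $\xi_k \ge 0$, so that under (28) with the uniform measure each $\xi_k^2$ follows a gamma distribution with shape $x_k + 1/2$; the function names, sample size and example histogram are hypothetical.

```python
import numpy as np

def log_posterior(xi, x):
    """Logarithm of eq. (28) up to a constant; the last axis runs over the bins."""
    return np.sum(2 * x * np.log(xi) - xi**2, axis=-1)

def accepts(tau, x, K=0.9, n_samples=20000, seed=0):
    """True if the proposal tau lies in the Bayesian interval I(K)."""
    rng = np.random.default_rng(seed)
    # Under (28) with uniform measure and xi_k >= 0: xi_k^2 ~ Gamma(x_k + 1/2).
    lam = rng.gamma(shape=x + 0.5, scale=1.0, size=(n_samples, len(x)))
    samples = np.sqrt(lam)
    # Posterior mass of the region where the density exceeds its value at tau.
    mass_above = np.mean(log_posterior(samples, x) > log_posterior(tau, x))
    return bool(mass_above < K)

x = np.array([30, 25, 40, 35])                       # observed counts (illustrative)
tau_good = np.sqrt(x.astype(float))                  # proposal at the posterior maximum
tau_bad = np.sqrt(np.array([60.0, 5.0, 10.0, 80.0])) # proposal far from the data
print(accepts(tau_good, x), accepts(tau_bad, x))
```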

If the count rates $x_k$ are large in every bin $k$, the procedure essentially
yields the well-known $\chi^2$-criterion.

If, however, $M > N = \sum_k x_k$, i.e. if the data are sparse, then this leads
to the condition

ln(l + ^)+iV-^/'f-H/C). (30) 

Here, is the inverse of the probability function. Note that the expression
in brackets (. . .) on the l.h.s. is $> 0$ if

E^l = i- (31) 

k 

Hence, the inequality (30) sets an upper limit to a positive expression. This
criterion is new. It is needed because the situation $M > N$ is surely met if
$k$ is a multidimensional variable, i.e. if the observable is multidimensional.
See [10]. Any attempt to apply Gaussian arguments is hopeless in this case.

8 Does a Proposed Probability Density Fit Observed Data?

Suppose that the data $x_1 \ldots x_N$ have been observed. Each $x_k$ is supposed
to follow, say, an exponential distribution

$$ p(x|\xi) = \xi^{-1} \exp(-x/\xi) . \tag{32} $$

They shall all be conditioned by one and the same hypothesis parameter $\xi$.
If this is true, the posterior $P(\xi|x_1 \ldots x_N)$ yields the distribution of $\xi$ and,
hence, the Bayesian interval for $\xi$. It is intuitively clear that, at least for
large $N$, one can learn from the data not only the best fitting value of $\xi$
but one can even decide whether the exponential (32) is justified at all, i.e.
one can find out whether the model is satisfactory. How does this work?
We do not want to produce a histogram by binning the data. This would
reduce the problem to the one solved in section 7 but it would introduce an
arbitrary element into the decision: the definition of the bins.

The basic idea is to determine $\xi$ from every data point, i.e. $N$ times,
and to decide whether this result is compatible with $\xi$ having the same value
everywhere.

One defines the distribution $q$ of the $N$-dimensional event $(x_1 \ldots x_N)$
conditioned by the $N$-dimensional hypothesis $(\xi_1 \ldots \xi_N)$ as the product

$$ q(x_1 \ldots x_N | \xi_1 \ldots \xi_N) = \prod_{k=1}^{N} p(x_k|\xi_k) . \tag{33} $$

One writes down the posterior distribution $Q(\xi_1 \ldots \xi_N | x_1 \ldots x_N)$ of the
$N$-dimensional hypothesis $(\xi_1 \ldots \xi_N)$. One studies its Bayesian interval $I(K)$.
A proposed hypothesis $(\tau_1 \ldots \tau_N)$ is acceptable exactly if it is an element of
$I$. In the case at hand, one determines the best value $a$ of the hypothesis
$\xi$ from the model that assigns one and the same hypothesis to all the data.
One then asks whether the $N$-dimensional $\tau$ with $\tau_k = a$ for all $k$ is in $I$.

The criterion (30) has been derived with the help of this argument.
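The argument can be sketched by Monte Carlo for the exponential model (32). The code below is an illustration under several assumptions that are not taken from the text: the measure of each $\xi_k$ is $\mu(\xi_k) = 1/\xi_k$, the "best value" $a$ is taken as the sample mean (the maximum of $P/\mu$ in the common-$\xi$ model), and the Bayesian interval is judged by the density relative to the measure. Under these assumptions $u_k = x_k/\xi_k$ is exponentially distributed with unit mean under the $N$-dimensional posterior, so the posterior mass lying above the density at $\tau$ can be estimated by sampling.

```python
import numpy as np

def mass_above(x, n_samples=20000, seed=0):
    """Posterior mass of the region where Q/mu exceeds its value at tau_k = a."""
    x = np.asarray(x, dtype=float)
    a = x.mean()                        # assumed best common value of xi
    u_tau = x / a
    stat_tau = np.sum(np.log(u_tau) - u_tau)          # ln(Q/mu) at tau, up to a constant
    rng = np.random.default_rng(seed)
    u = rng.exponential(size=(n_samples, x.size))     # u_k = x_k / xi_k under the posterior
    stat = np.sum(np.log(u) - u, axis=1)              # same quantity at posterior samples
    return float(np.mean(stat > stat_tau))

regular = np.full(50, 2.0)                            # data with no spread at all
heavy = np.array([0.01] * 50 + [100.0] * 2)           # far broader than an exponential
print(mass_above(regular), mass_above(heavy))
```

A proposal is accepted if the returned mass is smaller than $K$. In this sketch the strongly non-exponential sample `heavy` is rejected for any $K < 1$, while perfectly regular data are accepted, which reflects the one-sided nature of a test based on the Bayesian interval.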

Note, however, that the argument fails when one wants to know whether
the data $(x_1 \ldots x_N)$ follow the proposed distribution $t(x)$. There is no
hypothesis $\xi$. The family of distributions from which $t(x)$ is taken is not
defined. Indeed the above argument does not judge the distribution $p(x|a)$
all by itself. It actually judges whether the family of distributions, i.e. the
whole model $p(x|\xi)$, is compatible with the data. The question whether $t(x)$
fits the data is too general to be answered. One must specify which features
of the distribution are important: its form in a region where one finds
most events, or in a region where there are very few events? The relevant
features are expressed by the parametric dependence on $\xi$ and the measure
derived from it.

9 The Logic of Quantum Mechanics 

The results of section 5 show that probability amplitudes rather than
probabilities can be inferred in an unbiased way from counting events.
Alternatives $\nu$, $\nu'$ are defined by two vectors $|c_\nu\rangle$ and $|c_{\nu'}\rangle$. Each vector
characterizes a distribution over the bins $k = 1 \ldots M$ of a histogram. A decision
between $\nu$ and $\nu'$ amounts to assessing the amplitudes $\xi_\nu$ and $\xi_{\nu'}$. They determine
the strength with which the distributions $\nu$ and $\nu'$ are present in the data.
However, the amplitudes can interfere; the probabilities cannot. The real
amplitudes introduced so far can be generalized to complex ones: we arrive
at the quantum mechanical way to treat alternatives.

The parameters $\xi$ deduced from counting events are then completely
analogous to quantum mechanical probability amplitudes. It may be better
to turn this statement around and to say: the logic of quantum mechanics
is the logic of unbiased inference from random events; it is not a
collection of the rules according to which the microworld "exists".

The generalization of real amplitudes to complex ones is achieved by
generalizing the amplitude vector (23) to



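$$ a(\xi, \zeta, \phi) = \exp\left( \sum_\nu D_\nu \right) |0\rangle , \tag{34} $$

where the operator $D_\nu$ is

$$ D_\nu = \xi_\nu (B_\nu^+ - B_\nu) + i \zeta_\nu (B_\nu^+ + B_\nu) + i \phi_\nu . \tag{35} $$

Here, the three generators do generate a group since one has the commutator

$$ [B_\nu - B_\nu^+ ,\, B_\nu + B_\nu^+] = 2 . \tag{36} $$

The invariant measure is

$$ \mu(\xi, \zeta, \phi) = \text{const.} \tag{37} $$

By explicit evaluation of eq. (34) one finds

$$ a_x(\xi, \zeta, \phi) = \prod_{k=1}^{M} \frac{1}{\sqrt{x_k!}} \left( \sum_{\nu=1}^{n} (\xi_\nu + i\zeta_\nu)\, c_{k\nu}^* \right)^{x_k} \exp\left( -\frac{1}{2} \sum_\nu (\xi_\nu^2 + \zeta_\nu^2 - 2i\phi_\nu) \right) . \tag{38} $$

This is a generalization of expression (24). It is again a Poisson distribution,
but now the amplitude to find events in the $k$-th bin is

$$ \Xi_k = \sum_{\nu=1}^{n} (\xi_\nu + i\zeta_\nu)\, c_{k\nu}^* . \tag{39} $$

This is an expansion of the probability amplitude in terms of the system
of mutually orthogonal vectors $|c_\nu^*\rangle$ which may be complex. The expansion
coefficients $\xi_\nu + i\zeta_\nu$ may be complex, too.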
The phase $\sum_\nu \phi_\nu$ that appears in (38) cannot be measured since only
the modulus of (38) is accessible.

The Poisson distribution possesses form invariance with respect to the
probability amplitudes even if these are complex. Put differently, one should
expand the square root of a distribution into a system of orthonormal
vectors. They may be complex. The expansion coefficients deduced from the
data may also be complex. Inference on the real and imaginary parts of
the expansion coefficients is unbiased. The Fourier expansion is an example;
however, it must be the square root of the probability distribution that is
expanded.
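The Fourier example can be sketched with a discrete transform. The code below is an illustration only: the two-peaked distribution `p` over 64 bins is invented for the purpose, and the unitary convention `norm="ortho"` of NumPy's FFT plays the role of the orthonormal system. It expands the square root of the distribution, not the distribution itself.

```python
import numpy as np

k = np.arange(64)
# Invented two-peaked distribution over 64 bins, normalized to unity.
p = np.exp(-0.5 * ((k - 20.0) / 5.0)**2) + 0.3 * np.exp(-0.5 * ((k - 45.0) / 3.0)**2)
p /= p.sum()

amplitude = np.sqrt(p)                         # probability amplitude per bin
coeff = np.fft.fft(amplitude, norm="ortho")    # complex expansion coefficients
total = np.sum(np.abs(coeff)**2)               # equals sum of p by unitarity
recovered = np.abs(np.fft.ifft(coeff, norm="ortho"))**2
print(total)
```

Because the transform is unitary, the squared moduli of the coefficients sum to the total probability, and squaring the back-transformed amplitude returns the original distribution.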



10 Alternatives that cannot Interfere 

In quantum physics alternatives can interfere. Suppose that a cross section
$\sigma = \sigma(E)$ is observed as a function of energy $E$, e.g. in neutron scattering
by heavy nuclei. Suppose that this excitation function shows a resonance
line plus a smooth background. The book [11] is full of examples. Look e.g. at
the middle part of page 691. There is a flat background with superimposed
resonances. The resonance lines destructively and constructively interfere
with the background.

Speaking in the language of section 5, the figure offers a simple alternative
$\nu = 1, 2$. The first possibility ($\nu = 1$) is that the incoming neutron
together with the target forms a compound system which decays after some
time. The second possibility ($\nu = 2$) is that the reaction occurs without
delay. The probability amplitudes $\xi_\nu + i\zeta_\nu$ for these two possibilities interfere.
The interference pattern is visible if the resolution of the detection system
is better than the width of the resonance. If the resolution is much worse,
the interference pattern disappears and the cross section due to the resonance
is added to the cross section due to the background, i.e. one adds the
probabilities $\pi_\nu = \xi_\nu^2 + \zeta_\nu^2$ instead of the amplitudes.

The situation of insufficient resolution is the situation of classical physics 
and classical statistics: Alternatives do not interfere. Their probabilities are 
added. 

The typical situation of classical physics is that the detection system
lumps together many events that have distinguishable properties. In our
example: it does not discriminate the energies of the scattered particles
well enough. The events recorded in classical physics can in principle be
differentiated according to more properties than are actually used to distinguish
them. The tacit assumption of classical physics was that this is always
so.

If objects are observed that allow for a small number of distinctions
only, one is led to the logic of interfering probability amplitudes by the
way sketched in sections 5 and 9.

Consider the two slit experiment as a further example. If it is performed 
with polarized electrons, an impressive interference pattern appears. Use of 
unpolarized electrons reduces the contrast of the pattern. Had the scattered 
particles more than two "ways to be" , the contrast of the interference would 
be reduced up to the point, where the probability of a particle going through 
the first slit would be added to the probability of the particle going through 
the second slit. See chapter 1 of [12].

Suppose that we know that there is interference between the two
possibilities in the above neutron scattering experiment. The amplitudes $\xi_\nu + i\zeta_\nu$
for the possibilities $\nu = 1, 2$ would be inferred from the data $x_1 \ldots x_N$ as
follows. The distribution of the data is

$$ p(x_1 \ldots x_N | \xi_1 \zeta_1 \xi_2 \zeta_2) = \prod_{k=1}^{N} \frac{\lambda_k^{x_k}}{x_k!} \exp(-\lambda_k) , \tag{40} $$

where the expectation value $\lambda_k$ in the $k$-th bin is a function of $\xi_\nu, \zeta_\nu$, namely

$$ \lambda_k = \left| (\xi_1 + i\zeta_1)\, \mathrm{Line}(k) + (\xi_2 + i\zeta_2)\, \mathrm{Bg}(k) \right|^2 . \tag{41} $$

Here, $\mathrm{Line}(k)$ is the line shape and $\mathrm{Bg}(k)$ is the shape of the background.
By section 9, this is a form invariant model allowing for unbiased inference.

Suppose on the contrary that there cannot be any interference between
the two possibilities in the neutron experiment. The probabilities $\pi_1$ and $\pi_2$
are inferred via the model $p(x_1 \ldots x_N | \pi_1 \pi_2)$ which is again given by eq. (40).
But now $\lambda_k$ is the incoherent sum

$$ \lambda_k = \pi_1 |\mathrm{Line}(k)|^2 + \pi_2 |\mathrm{Bg}(k)|^2 . \tag{42} $$

The prior distribution for this model must be calculated with the help of (5). The
model is not form invariant, whence unbiased inference cannot be guaranteed.
A closer inspection shows that the model "has a prejudice against"
very small values of $\pi_1$ or $\pi_2$. This means: small values are harder to
establish than large ones.
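The difference between the coherent expectation (41) and the incoherent one (42) can be made explicit numerically. The sketch below uses invented ingredients: a Breit-Wigner-like complex line amplitude centred at zero energy with unit width, a flat background amplitude, and arbitrary complex coefficients `z1`, `z2` standing for $\xi_\nu + i\zeta_\nu$. None of these shapes or values is taken from the text.

```python
import numpy as np

E = np.linspace(-5.0, 5.0, 101)              # energy bins (arbitrary units)
line = 0.5 / (E + 0.5j)                      # assumed resonance amplitude Line(k)
bg = np.ones_like(E, dtype=complex)          # assumed flat background Bg(k)

z1 = 3.0 + 1.0j                              # xi_1 + i*zeta_1 (illustrative)
z2 = 2.0 - 0.5j                              # xi_2 + i*zeta_2 (illustrative)

lam_coherent = np.abs(z1 * line + z2 * bg)**2                  # eq. (41)
pi1, pi2 = abs(z1)**2, abs(z2)**2                              # pi_nu = xi_nu^2 + zeta_nu^2
lam_incoherent = pi1 * np.abs(line)**2 + pi2 * np.abs(bg)**2   # eq. (42)

interference = lam_coherent - lam_incoherent
print(interference.min(), interference.max())
```

The difference is the interference term $2\,\mathrm{Re}[(\xi_1 + i\zeta_1)\mathrm{Line}(k)\,((\xi_2 + i\zeta_2)\mathrm{Bg}(k))^*]$, which changes sign across the resonance: the line interferes destructively on one side and constructively on the other.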

11 Summary 

The basis of the foregoing work is twofold: (i) all statements and relations
in statistical inference must be invariant under reparametrizations and (ii)
to state ignorance about $\xi$ means to claim a symmetry.
It is the symmetry of form invariance that guarantees unbiased inference
of the hypothesis $\xi$, if the invariant measure of the symmetry group is
identified with the prior distribution in Bayesian inference. The invariant
measure is obtained in a straightforward way, i.e. without analysis of
the group, by Jeffreys' rule. We have shown that even distributions of
counted numbers possess form invariance.

A study of the Poisson distribution shows that the basic quantities in 
statistical inference are probability amplitudes not probabilities. The am- 
plitudes may even be complex. This is not only an analogy to the logic of 
quantum mechanics. This says that the logic of quantum mechanics is the 
logic of unbiased inference from counted events. 

These considerations do not mean that form invariance is a condition 
for the possibility of inference. Lack of form invariance precludes unbiased 
inference; it does not preclude inference. In the absence of form invariance, 
the prior distribution is defined as the differential geometrical measure on 
a suitably defined surface: The surface must lie in a space of probability 
amplitudes. The measure on the surface is again given by Jeffreys' rule. 

As a practically useful result, we have formulated the decision whether
a proposed distribution fits an observed histogram. The decision covers the
case of sparse data. This case does not allow a Gaussian approximation and,
hence, no $\chi^2$-test.